Abstract: In a group testing scheme, a series of tests are
designed to identify a small number t of defective items that are
present among a large number N of items. Each test takes as
input a group of items and produces a binary output indicating
whether any defective item is present in the group. In a non-adaptive
scheme the tests have to be designed in one shot. In
this setting, designing a testing scheme is equivalent to the 
construction of a disjunct matrix, an M x N binary matrix where 
the union of supports of any t columns does not contain the 
support of any other column. In principle, one wants to have 
such a matrix with minimum possible number M of rows. 

In this paper we consider the scenario where defective items 
are random and follow simple probability distributions. In 
particular we consider the cases where 1) each item can be 
defective independently with probability $t/N$, and 2) each t-set
of items can be defective with uniform probability. In both cases
our aim is to design a testing matrix that successfully identifies
the set of defectives with high probability. Both of these models
have been studied in the literature before, and it is known that
$O(t \log N)$ tests are necessary as well as sufficient (via random
coding) in both cases. 

Our main focus is explicit deterministic construction of the 
test matrices amenable to above scenarios. One of the most 
popular ways of constructing test matrices relies on constant- 
weight error-correcting codes and their minimum distance. In 
particular, it is known that codes result in test matrices with
$O(t^2 \log N)$ rows that identify any t defectives. We go beyond
the minimum distance analysis and connect the average distance
of a constant-weight code to the parameters of the resulting test
matrix. Indeed, we show how distance, a pairwise property of
the columns of the matrix, translates to a $(t+1)$-wise property of
the columns. With our relaxed requirements, we show that using
explicit constant-weight codes (e.g., based on algebraic geometry
codes) we may achieve a number of tests equal to $O\big(t \frac{\log^2 N}{\log t}\big)$ for
both the first and the second cases. While only away by a factor
of $\frac{\log N}{\log t}$ from the optimal number of tests, this is the best set of
parameters one can obtain from a deterministic construction, and
our main contribution lies in relating the group testing properties
to average and minimum distances of constant-weight codes.

Index Terms: Group testing, Disjunct matrices, Constant-weight
codes, Deterministic construction


I. Introduction 

Combinatorial search is an old and well-studied problem. 
In the most general form it is assumed that there is a set of 
N elements among which at most t are defective. This set 
of defective items is called the defective set or configuration. 
To find the defective set, one might test all the elements
individually for defects, requiring N tests. Intuitively, that
would be a waste of resources if $t \ll N$. On the other hand, to
identify the defective configuration it is required to ask at least
$\log \sum_{i=0}^{t} \binom{N}{i} \approx t \log \frac{N}{t}$ yes-no questions. The main objective
is to identify the defective configuration with a number of tests
that is as close to this minimum as possible.

In the group testing problem, a group of elements are tested 
together and if this particular group contains any defective 
element the test result is positive. Based on the test results 
of this kind one identifies (with an efficient algorithm) the 
defective set with minimum possible number of tests. The 
schemes (grouping of elements) can be adaptive, where the 
design of one test may depend on the results of preceding 
tests. For a comprehensive survey of adaptive group testing 
schemes we refer the reader to ca. 

In this paper we are interested in non-adaptive group testing 
schemes; here all the tests are designed together. If the number 
of designed tests is M, then a non-adaptive group testing
scheme is equivalent to the design of a binary test matrix of
size $M \times N$ where the $(i, j)$th entry is 1 if the ith test includes
the jth element; it is 0 otherwise. As the test results, we see
the Boolean OR of the columns corresponding to the defective
entries.
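To make the test model above concrete, the following minimal sketch computes the vector of test results as the Boolean OR of the defective columns. The function name `pooled_outcomes`, the 0-based indexing, and the small matrix `A` are illustrative assumptions, not notation from the paper.

```python
def pooled_outcomes(test_matrix, defectives):
    """Return the M test results as the Boolean OR of the defective columns.

    test_matrix: list of M rows, each a list of N zeros and ones.
    defectives:  set of 0-based column indices (indexing is an assumption here).
    """
    return [int(any(row[j] for j in defectives)) for row in test_matrix]


# Hypothetical 3 x 4 test matrix: row i marks the items pooled in test i.
A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
y = pooled_outcomes(A, {1})  # item 1 takes part in tests 0 and 1 only
```

A test is positive exactly when its row has a 1 in at least one defective column, so an empty defective set yields the all-zero result vector.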

Extensive research has been performed to find out the 
minimum number of required tests M in terms of the number 
of elements N and the maximum number of defective elements 
t. The best known lower bound says that it is necessary to
have $M = \Omega\big(\frac{t^2}{\log t}\log N\big)$ tests lfT3l . ifTbl . The existence of
non-adaptive group testing schemes with $M = O(t^2 \log N)$ is
also known for quite some time ifT^ . ||22|. On the other hand,
for the adaptive setting, schemes have been constructed with
as few as $O(t \log N)$ tests, optimal up to a constant factor

ina, Eli. 

In the literature, many relaxed versions of the group testing 
problem have been studied as well. For example, in El, M 
recovery of a list of items containing the true defectives is 
suggested (list-decoding superimposed codes). This notion was 
revisited in [Sl, ll2^ as list-disjunct matrix and in lfT9l . where 
it was assumed that recovering a large fraction of defective 
elements is sufficient. There are also information-theoretic 
models for the group testing problem where the test results 
can be noisy El (also see a, 0 ). In other versions of the 
group testing problem, a test may carry more than one bit 
of information a, Es, or the test results are threshold- 
based (see a and references therein). Algorithmic aspects 
of the recovery schemes have been studied in several papers. 
For example, papers ll24l and llT^ provide efficient recovery 
algorithms for non-adaptive group testing.

Here as well, we consider two relaxed versions of the 
group testing problem: we want recovery to be successful
with high probability assuming uniform distributions of the
defective items. In the first scenario, each of the N items can
be defective with probability $t/N$. This model of defectives,
called Model 1 throughout the rest of this paper, is as old as 
the group testing problem im and was rigorously defined 
in ll3^ . It is also the subject of very recent works such 
as GT). We provide explicit construction of test matrices 
with $O(t \log^2 N / \log t)$ tests for this situation. In the second
scenario, we want the recovery to be successful for a very
large fraction of all possible t-sets as defective configurations.
This scenario, called Model 2 throughout this paper, was 
considered under the name of weakly separated design in 12 ^ . 
Bbl and lIZTll . It is known (see, HblH that, with this relaxation 
it might be possible to reduce the number of tests to be 
proportional to $t \log N$. However this result is not constructive.
Here also we provide explicit construction of test matrices with
$O(t \log^2 N / \log t)$ tests. Note that this result is order-optimal
when t is proportional to $N^{\delta}$ for $0 < \delta < 1$.

In particular, our result leads to improvement over the
construction of weakly-separated design from m, whenever
$\log N < (\log t)^2$. In lITSl . the total N items are partitioned
and then a nonadaptive scheme for a smaller set of elements
is repeated on each of the parts. It follows from a simple union
bound that one would need $O(t \log t \log N)$ tests for both the
above random models to have high probability identification.

The repeated-block construction of ifTsll is analogous to 
repeating a good error-correcting code of small length to 
construct a long error-correcting code. Indeed, one can find the 
best linear error-correcting code of length log n and then repeat 
that n/logn times to construct a capacity-achieving code of 
length n. While this can be a first construction, it does not 
give any insight regarding the properties that are important 
for the problem. In an earlier conference version Boll of 
this paper, we showed that the properties of the distribution 
of Hamming distances of the columns of testing matrix can 
play a role in identification. While the result of ll^ leads to 
suboptimal number of tests, we can use better concentration 
inequalities to arrive at improvements over it (see Theorem 5).
Our construction also turns out to give better parameters than
the repetition scheme of m, whenever $\log N < (\log t)^2$.
Note that this in particular includes the regime where t varies
as $N^{\delta}$ for $0 < \delta < 1$, which is the premise of very recent
works such as Scarlett and Cevher IIJTII . There is no apparent 
relation to the work of ifTSl with our techniques. In particular, 
our ideas cannot be viewed as an extension of repeated block 
construction. 

We believe that our main contribution lies in 1) relating
the group testing properties to the average Hamming distance
between the columns of the testing matrix and 2) using proper
classes of explicit codes (such as Algebraic-Geometric codes)
that satisfy the required properties of average and minimum
distances.

Non-adaptive group testing has found applications in multiple
different areas, such as multi-user communication |[3l,
ill, DNA screening [33], pattern finding flEj, etc. It can be
observed that in many of these applications it would have
been still useful to have a scheme that identifies almost all 
different defective configurations if not all possible defective 
configurations. The above relaxations form a parallel of similar 
works in compressive sensing (see, El, ED) where recovery 
of almost all sparse signals from a generic random model is 
considered. 

A construction of group testing schemes from error-
correcting code matrices and using code concatenation appeared
in the seminal paper by Kautz and Singleton [25]. Code
concatenation is a way to construct binary codes from codes
over a larger alphabet [28]. In [25], the authors concatenate
a q-ary ($q > 2$) Reed-Solomon code with a unit-weight code
to use the resulting codewords as the columns of the testing
matrix. Recently in [35], an explicit construction of a scheme
with $M = O(t^2 \log N)$ tests is provided. The construction
of [35] is based on the idea of [25]: instead of the Reed-
Solomon code, they take a low-rate code that achieves the
Gilbert-Varshamov bound of coding theory l28l . ES- Papers, 
such as ESI, m, also consider construction of non-adaptive 
group testing schemes. 

In this paper we show that the explicit construction of
[35] based on error-correcting codes works for both Model
1 and Model 2 and results in the numbers of tests claimed above.
Not only that, using explicit families of Algebraic-Geometric
codes in conjunction with the Kautz-Singleton construction
we obtain test matrices with the same performance guarantee.

A. Results and organization 

The constructions of [25], [35] and many others are based on
constant-weight error-correcting codes, a set of binary vectors 
of same Hamming weight (number of ones). The group-testing 
recovery property relies on the pairwise minimum distance 
between the vectors of the code ES). In this work, we go 
beyond this minimum distance analysis and relate the group-testing
parameters to the average distance of the constant-
weight code. This allows us to connect the group testing
matrices designed for random models of defectives to error-
correcting codes in a general way (see Thm. 2 and Thm. 3).
Previously the connection between distances of the code and 
weakly separated designs was only known for the very specific 
family of maximum distance separable codes ED, where 
much more information than the average distance is evident. 

Based on the newfound connection, for both Model 1 and 
Model 2, we construct explicit (constructible deterministically 
in polynomial time) families of non-adaptive group testing 
schemes. This result can be summarized in the following 
informal theorem. 

Theorem 1 (Informal): For both Models 1 and 2, our deterministic
nonadaptive scheme can identify the set of defectives
exactly with probability $1 - \epsilon$. The sufficient number of tests
required for this is $O\big(\frac{t}{\log t}\log N \ln\frac{N}{\epsilon}\big)$.

One of our construction techniques is the same as the scheme
of [25], [35]; however, with a finer analysis relying on the
distance properties of a linear code we are able to achieve
more. We also use explicit families of Algebraic-Geometric
codes to obtain the same set of parameters. One of the main
contributions is to show a general way to establish a property
for almost all t-tuples of elements from a set based on the
mean pairwise statistics of the set.

In Section II we provide the necessary definitions and
preliminaries. The relation of group testing parameters of
Model 1 with constant-weight codes is provided in Section
III. In Section IV we establish the connection between the
parameters of a weakly separated design and the average distance
of a constant-weight code. In Section V we discuss our
construction schemes (including one that relies on Algebraic-
Geometric codes) that work for both of Models 1 and 2.

II. Basic definitions and properties 

A vector is denoted by bold lowercase letters, such as x,
and the ith entry of the vector x is denoted by $x_i$. The
Hamming distance between two vectors is denoted by $d_H(\cdot, \cdot)$.
The support of a vector x is the set of coordinates where the
vector has nonzero entries. It is denoted by supp(x). We use
the usual set terminology, where a set A contains B if $B \subseteq A$.
Also, below [n] denotes $\{1, 2, \dots, n\}$.

First of all, we define the following two models for defectives.

Definition 1 (Random models of defectives): In the random
defective Model 1, among a set of N elements, each
element is independently defective with probability $t/N$. In the
random defective Model 2, each subset of cardinality t of a
set of N elements has equal probability $1/\binom{N}{t}$ of being the
defective set.
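The two random models can be simulated directly. The sketch below is only an illustration; the function names `sample_model1` and `sample_model2` and the 0-based item labels are assumptions made for the example.

```python
import random


def sample_model1(N, t, rng):
    """Model 1: each of the N items is defective independently with probability t/N."""
    return {j for j in range(N) if rng.random() < t / N}


def sample_model2(N, t, rng):
    """Model 2: a uniformly random subset of exactly t of the N items is defective."""
    return set(rng.sample(range(N), t))


rng = random.Random(0)           # fixed seed so the sketch is reproducible
S1 = sample_model1(1000, 10, rng)  # size is random, with mean t = 10
S2 = sample_model2(1000, 10, rng)  # size is always exactly t = 10
```

Under Model 1 the number of defectives is binomial with mean t, while under Model 2 it is exactly t; this is why the two models need different concentration arguments later on.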

A. Disjunct matrices 

The following definition of disjunct matrices is standard and 
can be found in Ida Ch. 7]. 

Definition 2: An $M \times N$ binary matrix A is called t-disjunct
if the support of any column is not contained in the union of
the supports of any other t columns.

It is not very difficult to see that a t-disjunct matrix gives a
group testing scheme that identifies any defective set up to size
t. On the other hand any group testing scheme that identifies
any defective set up to size t must be a $(t-1)$-disjunct matrix.
The definition of disjunct matrix can be restated as follows: a
matrix is t-disjunct if any $t+1$ columns indexed by $i_1, \dots, i_{t+1}$
of the matrix form a submatrix which must have a row that
has exactly one 1 in the $i_j$th position and zeros in the other
positions, for $j = 1, \dots, t+1$.

To a great advantage, disjunct matrices allow for a simple
identification algorithm that runs in time $O(Nt)$, as we see
below.

B. Disjunct decoding 

Given the test results $y \in \{0,1\}^M$, we use the following
recovery algorithm to find the defectives. Suppose A is the
test matrix and $a^{(j)} \in \{0,1\}^N$, $j = 1, \dots, M$, denotes the jth
row of A. The recovery algorithm simply outputs

$$[N] \setminus \bigcup_{j : y_j = 0} \mathrm{supp}(a^{(j)})$$

as the set of defectives M Ch. 7]. 
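The recovery rule can be sketched as follows, under the reading that every item appearing in a negative test is cleared and all remaining items are declared defective. The function name `disjunct_decode` and the small 1-disjunct example matrix (columns are the six weight-2 vectors of length 4) are illustrative assumptions.

```python
from itertools import combinations


def disjunct_decode(test_matrix, y):
    """Return the items that never appear in a negative test."""
    n_items = len(test_matrix[0])
    cleared = set()
    for row, outcome in zip(test_matrix, y):
        if outcome == 0:
            # Every item pooled in a negative test is certainly non-defective.
            cleared.update(j for j in range(n_items) if row[j])
    return set(range(n_items)) - cleared


# Hypothetical 4 x 6 matrix whose columns are all weight-2 binary vectors.
pairs = list(combinations(range(4), 2))
A = [[int(r in p) for p in pairs] for r in range(4)]

# Item 2 (support {0, 3}) is the only defective, so tests 0 and 3 are positive.
y = [int(any(row[j] for j in {2})) for row in A]
recovered = disjunct_decode(A, y)
```

With two defectives this particular matrix is no longer disjunct enough, and the decoder returns a strict superset of the defective set, which matches the remark that the output always contains all defectives.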


Note that, irrespective of the testing matrix, this algorithm
will always output a set that contains all the defective elements.
Moreover, if the testing matrix is disjunct, then the
output is exactly equal to the set of defectives. We have the
following simple proposition.

Proposition 1: Suppose the set of defectives is $S \subseteq [N]$.
Let $a_k$ denote the kth column of the test matrix A. Then the
disjunct decoding algorithm recovers the defectives exactly if
$\bigcup_{j \in S} \mathrm{supp}(a_j)$ does not contain the support of $a_i$ for all
$i \in [N] \setminus S$.

C. Almost disjunct matrices 

Below we define a relaxed form of disjunct matrices. This 
definition appeared very closely in ll^ . ll46l and exactly in 

(ED. 

Definition 3: For any $\epsilon > 0$, an $M \times N$ matrix A is called
$(t, \epsilon)$-disjunct if the set of t-tuples of columns (of size $\binom{N}{t}$)
has a subset B of size at least $(1 - \epsilon)\binom{N}{t}$ with the following
property: for all $J \in B$, $\bigcup_{\kappa \in J} \mathrm{supp}(\kappa)$ does not contain the
support of any column $\nu \notin J$.

In other words, the union of supports of a randomly and
uniformly chosen set of t columns from a $(t, \epsilon)$-disjunct
matrix does not contain the support of any other column with
probability at least $1 - \epsilon$. It is clear that for $\epsilon = 0$, the $(t, \epsilon)$-
disjunct matrices are the same as t-disjunct matrices.

It is easy to see the following fact.

Proposition 2 (Model 2): A $(t, \epsilon)$-disjunct matrix gives a
group testing scheme that can identify all but at most a fraction
$\epsilon > 0$ of all possible defective configurations of size t.

D. Constant-weight codes 

A binary $(M, N, d)$ code C is a set of size N consisting of
$\{0, 1\}$-vectors of length M. Here d is the largest integer such
that any two vectors (codewords) of C are at least Hamming
distance d apart; d is called the minimum distance (or distance)
of C. If all the codewords of C have Hamming weight w, then
it is called a constant-weight code. In that case we write C is
an $(M, N, d, w)$-constant-weight binary code.

Constant-weight codes can give constructions of group testing
schemes. One just arranges the codewords as the columns
of the test matrix. Kautz and Singleton proved the following
in [25].

Proposition 3: An $(M, N, d, w)$-constant-weight binary
code provides a t-disjunct matrix where $t = \big\lfloor \frac{w - 1}{w - d/2} \big\rfloor$.

Proof: The intersection of supports of any two columns
has size at most $w - d/2$. Hence if $w > t(w - d/2)$, the support
of any column will not be contained in the union of supports
of any t other columns. ■
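As a small sanity check of the proposition, the sketch below compares the guaranteed level $\lfloor (w-1)/(w-d/2) \rfloor$ with a brute-force disjunctness test. The helper names and the example code (the seven lines of the Fano plane, a constant-weight code with w = 3 and d = 4) are assumptions chosen for illustration, and the brute-force check is only practical for tiny codes.

```python
from itertools import combinations


def kautz_singleton_t(w, d):
    """floor((w - 1) / (w - d/2)), computed in integers; assumes d < 2w."""
    return (2 * (w - 1)) // (2 * w - d)


def is_t_disjunct(columns, t):
    """Brute force: no column support lies inside the union of t other supports."""
    supports = [frozenset(i for i, b in enumerate(c) if b) for c in columns]
    for idx, s in enumerate(supports):
        others = supports[:idx] + supports[idx + 1:]
        for group in combinations(others, t):
            if s <= frozenset().union(*group):
                return False
    return True


# Hypothetical example: the 7 lines of the Fano plane as weight-3 codewords.
lines = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
columns = [[int(i in line) for i in range(7)] for line in lines]

t_guaranteed = kautz_singleton_t(3, 4)  # any two lines meet in one point, so d = 4
```

For this example the bound is tight: the matrix is 2-disjunct, but three lines through the three points of a fourth line cover it, so it is not 3-disjunct.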

Extensions of Prop. 3 are our main results. To do that we need
to define the average distance D of a code C:

$$D(C) = \frac{1}{|C|} \min_{x \in C} \sum_{y \in C} d_H(x, y).$$

Here $d_H(x, y)$ denotes the Hamming distance between x and
y. Also define the second-moment of distance distribution:

$$D_2(C) = \frac{1}{|C|} \min_{x \in C} \sum_{y \in C} d_H(x, y)^2.$$
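The two distance statistics can be computed by direct enumeration. The sketch below assumes that both quantities are normalized by the code size and minimized over the reference codeword x; this normalization is an assumption of the example, since the displayed definitions are damaged in the source. The names `hamming` and `distance_stats` are hypothetical.

```python
def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def distance_stats(code):
    """Return (D, D2): first and second moments of the distance distribution.

    Assumption: both are normalized by len(code) and minimized over the
    reference codeword x.
    """
    n = len(code)
    first = min(sum(hamming(x, y) for y in code) / n for x in code)
    second = min(sum(hamming(x, y) ** 2 for y in code) / n for x in code)
    return first, second


# Hypothetical constant-weight code of length 4 and weight 2.
code = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
D, D2 = distance_stats(code)
```

For a linear code the distance profile is the same from every codeword, so the minimization over x has no effect there.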
III. Model 1: independent defectives - test matrices from constant-weight codes

In this section, we consider the independent failure model 
(Model 1) and show how the minimum and average distances 
of a constant-weight binary code contribute to a nonadaptive 
group testing scheme. Recall, in this model we assume that
among N items, each is defective with a probability $t/N$. The
main result of this section is the following theorem.

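Theorem 2 (Model 1): Suppose we have a constant-weight
binary code C of size N, minimum distance d and average
distance D such that every codeword has length M and weight
w. The test matrix obtained from the code exactly identifies
all the defective items (chosen according to Model 1) with
probability at least $1 - \epsilon$ (over the probability space of Model
1) if

$$w - \frac{d}{2} < \frac{3\big(w - t(w - D/2)\big)^2}{2\big(2t(w - D/2) + w\big)\ln\frac{N}{\epsilon}}. \qquad (1)$$

We will need the help of the following lemma to prove the
theorem. Note, from Prop. 1, the disjunct-recovery algorithm
will be successful if the union of supports of the columns
corresponding to the defectives does not contain the support
of any other column. Suppose the testing matrix is constructed
from an $(M, N, d, w)$-constant-weight code C (each column is
a codeword). Let

$$C = \{c_1, c_2, \dots, c_N\}.$$

Moreover, assume $X_j \in \{0,1\}$ is the indicator Bernoulli($t/N$)
random variable that denotes whether the jth element is
defective or not.

Lemma 1: Suppose, for all $i \in [N]$, we have

$$\sum_{j=1, j \ne i}^{N} X_j \Big(w - \frac{d_H(c_i, c_j)}{2}\Big) < w.$$

Then the disjunct-recovery algorithm will exactly identify the
defective elements.

Proof: The lemma directly follows from Prop. 1 and the
fact that for any i, j, $w - d_H(c_i, c_j)/2$ is nonnegative. Let
$S \subseteq [N]$ be the random set of defectives. The disjunct-
recovery algorithm will be successful when for all $i \in [N] \setminus S$,

$$\sum_{j \in S} \Big(w - \frac{d_H(c_i, c_j)}{2}\Big) < w.$$

Hence the condition of the lemma is sufficient for success. ■

Now we are ready to prove Thm. 2.

Proof of Thm. 2: First of all, by union bound,

$$\Pr\Big(\exists i \in [N] : \sum_{j=1, j \ne i}^{N} X_j \Big(w - \frac{d_H(c_i, c_j)}{2}\Big) \ge w\Big) \le \sum_{i=1}^{N} \Pr\Big(\sum_{j=1, j \ne i}^{N} X_j \Big(w - \frac{d_H(c_i, c_j)}{2}\Big) \ge w\Big).$$

For a fixed i, we would want to upper bound the probability
above in the right hand side under the summation. Assume
$a_j \equiv w - d_H(c_i, c_j)/2$. Notice, for $j \ne i$, $a_j X_j - \mathbb{E}(a_j X_j) \le a_j(1 - t/N) \le (1 - t/N)(w - d/2)$ and $\mathbb{E}(a_j X_j - a_j t/N)^2 = a_j^2 \frac{t}{N}\big(1 - \frac{t}{N}\big)$.

We have,

$$\Pr\Big(\sum_{j=1, j \ne i}^{N} X_j a_j \ge w\Big) = \Pr\Big(\sum_{j=1, j \ne i}^{N} (X_j - t/N) a_j \ge w - \frac{t}{N}\sum_{j=1, j \ne i}^{N} a_j\Big)$$

$$\stackrel{(a)}{\le} \Pr\Big(\sum_{j=1, j \ne i}^{N} (X_j - t/N) a_j \ge w - \frac{t}{N}\sum_{j=1}^{N} a_j\Big),$$

where (a) is true as the event within the probability in second
line implies the event in the third line.

Now, we can use the classical Bernstein concentration
inequality (see the version we use in ll^ Thm. 2.7]), to have,

$$\Pr\Big(\sum_{j=1, j \ne i}^{N} X_j a_j \ge w\Big) \le \exp\Bigg(-\frac{\big(w - \frac{t}{N}\sum_{j=1}^{N} a_j\big)^2}{2\Big(\frac{t}{N}\sum_{j \ne i} a_j^2 + \frac{1}{3}\big(w - \frac{d}{2}\big)\big(w - \frac{t}{N}\sum_{j=1}^{N} a_j\big)\Big)}\Bigg)$$

$$\le \exp\Bigg(-\frac{\big(w - \frac{t}{N}\sum_{j=1}^{N} a_j\big)^2}{2\Big(\frac{t}{N}\big(w - \frac{d}{2}\big)\sum_{j=1}^{N} a_j + \frac{1}{3}\big(w - \frac{d}{2}\big)\big(w - \frac{t}{N}\sum_{j=1}^{N} a_j\big)\Big)}\Bigg)$$

$$= \exp\Bigg(-\frac{3\big(w - \frac{t}{N}\sum_{j=1}^{N} a_j\big)^2}{2\big(w - \frac{d}{2}\big)\big(\frac{2t}{N}\sum_{j=1}^{N} a_j + w\big)}\Bigg)$$

$$\stackrel{(b)}{\le} \exp\Bigg(-\frac{3\big(w - t(w - D/2)\big)^2}{2\big(w - \frac{d}{2}\big)\big(2t(w - D/2) + w\big)}\Bigg),$$

and (b) follows because the exponent above is an increasing
function of $\sum_{j=1}^{N} a_j$ and $\frac{1}{N}\sum_{j=1}^{N} a_j = w - \frac{1}{N}\sum_{j=1}^{N} \frac{d_H(c_i, c_j)}{2} \le w - \frac{D}{2}$.
Now using union bound, we deduce that the test matrix
will successfully identify the defective elements exactly with
probability $1 - \epsilon$ if

$$\frac{3\big(w - t(w - D/2)\big)^2}{2\big(w - \frac{d}{2}\big)\big(2t(w - D/2) + w\big)} > \ln\frac{N}{\epsilon},$$

which proves the theorem. ■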
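The sufficient condition just derived is easy to evaluate numerically. The sketch below assumes the condition reads $w - d/2 < 3(w - t(w - D/2))^2 / \big(2(2t(w - D/2) + w)\ln(N/\epsilon)\big)$, which is the form consistent with the end of the proof; the function name and the parameter values are hypothetical and do not come from a specific code in the paper.

```python
import math


def largest_t_theorem2(w, d, D, N, eps):
    """Largest t meeting the assumed Model 1 condition, or 0 if no t does."""
    best, t = 0, 1
    # Only values of t with w - t(w - D/2) > 0 are meaningful for the bound.
    while t <= N and w - t * (w - D / 2) > 0:
        slack = w - t * (w - D / 2)
        rhs = 3 * slack ** 2 / (2 * (2 * t * (w - D / 2) + w) * math.log(N / eps))
        if w - d / 2 < rhs:
            best = t
        t += 1
    return best


# Hypothetical parameters: weight 100, minimum distance 196, average distance 198.
t_max = largest_t_theorem2(w=100, d=196, D=198, N=1000, eps=0.01)
```

For these illustrative numbers the condition holds up to t = 45, close to the minimum-distance guarantee of Prop. 3, which suggests that the gain from the average-distance analysis depends strongly on the gap between D and d.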

A similar result can be obtained for Model 2. However,
because of the dependence among the random choice of defectives,
we need to use concentration inequalities for sampling
without replacement.

IV. Model 2: $(t, \epsilon)$-disjunct matrices from constant-weight codes
Our main result of this section is the following. 
Theorem 3 (Model 2): Suppose we have a constant-weight
binary code C of size N, minimum distance d and average
distance D such that every codeword has length M and weight
w. The test matrix obtained from the code is $(t, \epsilon)$-disjunct for
the largest t such that,

$$d > D - \frac{3\big(w - t(w - D/2)\big)^2}{\ln\frac{N-t}{\epsilon}\,\big(2t(w - D/2) + w\big)}$$

holds.

One can compare the results of Prop. 3 and Theorem 3 to
see the improvement achieved as we relax the definition of
disjunct matrices. Indeed, Theorem 3 implies,

$$t \le \frac{w - \sqrt{\frac{1}{3}(D - d)\ln\frac{N-t}{\epsilon}\,\big(2t(w - D/2) + w\big)}}{w - D/2},$$

as opposed to $t \le \frac{w - 1}{w - d/2}$ from Prop. 3. This will lead to
the final improvement on the parameters of Porat-Rothschild
construction [35], as we will see in Section V.
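The comparison between the two guarantees can be tried on numbers. The sketch below assumes the Theorem 3 condition has the form $d > D - 3(w - t(w - D/2))^2 / \big(\ln\frac{N-t}{\epsilon}\,(2t(w - D/2) + w)\big)$, with the logarithm argument taken from the union bound over the $N - t$ non-defective items; the function names and parameter values are hypothetical.

```python
import math


def prop3_t(w, d):
    """Minimum-distance guarantee floor((w - 1) / (w - d/2)); assumes d < 2w."""
    return math.floor((w - 1) / (w - d / 2))


def largest_t_theorem3(w, d, D, N, eps):
    """Largest t meeting the assumed average-distance condition, or 0 if none."""
    best, t = 0, 1
    while t < N and w - t * (w - D / 2) > 0:
        slack = w - t * (w - D / 2)
        log_term = math.log((N - t) / eps)
        bound = D - 3 * slack ** 2 / (log_term * (2 * t * (w - D / 2) + w))
        if d > bound:
            best = t
        t += 1
    return best


# Hypothetical parameters: weight 100, minimum distance 196, average distance 198.
t_min_dist = prop3_t(100, 196)
t_avg_dist = largest_t_theorem3(w=100, d=196, D=198, N=1000, eps=0.01)
```

With these illustrative values the relaxed guarantee admits more defectives than the worst-case one, at the price of failing on a small fraction of defective sets.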


A. Proof of Theorem 3

This section is dedicated to the proof of Theorem 3.
Suppose we have a constant-weight binary code C of size
N and minimum distance d such that every codeword has
length M and weight w. Let the average distance of the code
be D. Note that this code is fixed: we will prove the almost-
disjunctness property of this code.

Let us now choose t codewords randomly and uniformly
from all possible $\binom{N}{t}$ choices. Let the randomly chosen
codewords be $\{c_1, c_2, \dots, c_t\}$. In what follows, we adapt the
proof of Prop. 3 in a probabilistic setting.

Assume we call the random set of defectives S. For
$l \in [N] \setminus S$, define the random variables $Z^l = \sum_{j=1}^{t}\big(w - \frac{d_H(c_l, c_j)}{2}\big)$.
Clearly, $Z^l$ is the maximum possible size of the
portion of the support of $c_l$ that is common to at least one
of $c_j$, $j = 1, \dots, t$. Note that the size of support of $c_l$ is w.
Hence, as we have seen in the proof of Prop. 3, if $Z^l$ is less
than w for all l's that are not part of the defective set, then
the disjunct decoding algorithm will be successful. Therefore,
we aim to find the probability $\Pr(\exists l \in [N] \setminus S : Z^l \ge w)$ and
show it to be bounded above by $\epsilon$ under the condition of the
theorem.

As the variables $Z^l$ are identically distributed, using union
bound,

$$\Pr(\exists l \in [N] \setminus S : Z^l \ge w) \le (N - t)\Pr(Z^l \ge w),$$

where l can now be assumed to be uniformly distributed in $[N] \setminus S$.
In the following, we will find an upper bound on $\Pr(Z^l \ge w)$.

In 1^ . an upper bound on $\Pr(Z^l \ge w)$ was found by
Azuma's inequality. It turns out that by using a trick from
Hoeffding [20], and using the Bernstein inequality we can
achieve a tighter bound. First note that,

$$Z^l = \sum_{j=1}^{t}\Big(w - \frac{d_H(c_l, c_j)}{2}\Big),$$

where $c_1, \dots, c_t, c_l$ are randomly and uniformly chosen $(t+1)$
codewords from all possible $\binom{N}{t+1}$ choices.

Given $c_l$, the other codewords are randomly sampled from
the code without replacement. It follows from ll^ Theorem
4], for any real number s that,

$$\mathbb{E}\Big[\exp\Big(s\sum_{j=1}^{t}\Big(w - \frac{d_H(c_l, c_j)}{2}\Big)\Big) \,\Big|\, c_l\Big] \le \mathbb{E}\Big[\exp\Big(s\sum_{j=1}^{t}\Big(w - \frac{d_H(c_l, X_j)}{2}\Big)\Big) \,\Big|\, c_l\Big],$$

where $X_1, \dots, X_t$ are codewords randomly and uniformly
sampled from the code with replacement. Therefore,

$$\mathbb{E}\exp(sZ^l) \le \mathbb{E}\exp\Big(s\sum_{j=1}^{t}\Big(w - \frac{d_H(c_l, X_j)}{2}\Big)\Big),$$

where $c_l$ is a randomly and uniformly chosen codeword and
$X_1, \dots, X_t$ are codewords randomly and uniformly sampled
from the code $C \setminus \{c_l\}$ with replacement.

Therefore, for any $s > 0$, using Markov inequality,

$$\Pr(Z^l \ge w) \le \mathbb{E}\, e^{s\big(\sum_{i=1}^{t} Y_i - w\big)},$$

where $Y_i = w - \frac{d_H(c_l, X_i)}{2}$, $i = 1, \dots, t$, are independent
random variables with,

$$\mathbb{E}Y_i \le w - \frac{D}{2} \quad \text{and} \quad \mathbb{E}Y_i^2 \le \Big(w - \frac{d}{2}\Big)\Big(w - \frac{D}{2}\Big),$$

since $Y_i$ is a nonnegative random variable. Now, since the $Y_i$'s are
all independent, we can use ll^ Thm. 2.7] (or its method of
proof) again, to upper bound large deviation for the sum $\sum_i Y_i$.
Indeed, we must have,

$$\Pr(Z^l \ge w) \le \exp\Big(-\frac{\big(w - \sum_{i=1}^{t}\mathbb{E}Y_i\big)^2}{A}\Big),$$

where,

$$A = 2\sum_{i=1}^{t}\big(\mathbb{E}Y_i^2 - (\mathbb{E}Y_i)^2\big) + \frac{2}{3}\Big(\frac{D}{2} - \frac{d}{2}\Big)\Big(w - \sum_{i=1}^{t}\mathbb{E}Y_i\Big)$$

$$\le 2t\Big(w - \frac{d}{2}\Big)\Big(w - \frac{D}{2}\Big) - 2t\Big(w - \frac{D}{2}\Big)^2 + \frac{1}{3}\Big(w - t\Big(w - \frac{D}{2}\Big)\Big)(D - d)$$

$$= t\Big(w - \frac{D}{2}\Big)(D - d) + \frac{1}{3}\Big(w - t\Big(w - \frac{D}{2}\Big)\Big)(D - d).$$

Hence, we have,

$$\Pr(Z^l \ge w) \le \exp\Bigg(-\frac{3\big(w - t(w - \frac{D}{2})\big)^2}{\big(2t(w - \frac{D}{2}) + w\big)(D - d)}\Bigg).$$

Now using union bound, we deduce that the test matrix
will successfully identify the defective elements exactly with
probability $1 - \epsilon$ if

$$\frac{3\big(w - t(w - D/2)\big)^2}{(D - d)\big(2t(w - D/2) + w\big)} > \ln\frac{N - t}{\epsilon},$$

which proves the theorem.
B. Higher order statistics of distance distribution

We get slightly tighter bounds in both Theorems 2 and
3 if statistics of higher order than only the average distance of the codes
are considered. Indeed in both of the main theorems
we have used the inequality,

$$\frac{1}{N}\sum_{y \in C \setminus \{x\}}\Big(w - \frac{d_H(x, y)}{2}\Big)^2 \le \Big(w - \frac{d}{2}\Big)\frac{1}{N}\sum_{y \in C \setminus \{x\}}\Big(w - \frac{d_H(x, y)}{2}\Big),$$

since $w - d_H(x, y)/2$ is always nonnegative. However both of the
theorems could be rephrased in terms of the second-moment
of the distance distribution. For example, Theorem 3 can be
restated with a slightly stronger result.

Theorem 4: Suppose we have a constant-weight
$(M, N, d, w)$ binary code C with average distance D
and the second-moment of the distance distribution $D_2$. The
test matrix obtained from the code is $(t, \epsilon)$-disjunct for the
largest t such that,

$$d > D + \frac{3t(D_2 - D^2)}{2\big(w - t(w - D/2)\big)} - \frac{3\big(w - t(w - D/2)\big)}{\ln\frac{N-t}{\epsilon}} \qquad (2)$$

holds.

We omit the proof of this theorem as it is exactly the same as the
proof of Theorem 3.

However, it turns out (in the next section) that our results,
that rely only on the average distance, are sufficient to give
near-optimal performance in group testing schemes in terms
of the number of tests. In particular, use of (??) in conjunction
with the construction of constant-weight codes below, instead
of Theorem 3, leads to improvement only on the constant
terms.

V. Construction

A. Discussions

As we have seen in Section II, constant-weight codes can be
used to produce disjunct matrices. Kautz and Singleton [25]
give a construction of constant-weight codes that results in
good disjunct matrices. In their construction, they start with
a Reed-Solomon (RS) code, a q-ary error-correcting code of
length $q - 1$. For a detailed discussion of RS codes we refer the
reader to the standard textbooks of coding theory [28], [36].
Next they replace the q-ary symbols in the codewords by unit
weight binary vectors of length q. The mapping from q-ary
symbols to length-q unit weight binary vectors is bijective;
i.e., it is $0 \to 100\dots0$; $1 \to 010\dots0$; ...; $q - 1 \to 0\dots01$.
We refer to this mapping as $\phi$. As a result, one obtains a set
of binary vectors of length $q(q - 1)$ and constant weight $q - 1$.
The size of the resulting binary code is the same as the size of
the RS code, and the distance of the binary code is twice
the distance of the RS code.

For a q-ary RS code of size N and length $q - 1$, the minimum
distance is $q - 1 - \log_q N + 1 = q - \log_q N$. Hence, the
Kautz-Singleton construction is a constant-weight code with
length $M = q(q - 1)$, weight $w = q - 1$, size N and distance
$2(q - \log_q N)$. Therefore, from Prop. 3 we have a t-disjunct
matrix with,

$$t = \frac{q - 1 - 1}{q - 1 - q + \log_q N} = \frac{q - 2}{\log_q N - 1} \approx \frac{q \log q}{\log N} \approx \frac{\sqrt{M}\log M}{2\log N}.$$

On the other hand, note that the average distance of the
RS code is $(q - 1)(1 - 1/q)$. Hence the average distance
of the resulting constant-weight code from Kautz-Singleton
construction will be

$$D = \frac{2(q - 1)^2}{q}.$$

Now, substituting these values in Theorem 3 we have a $(t, \epsilon)$-
disjunct matrix, where,

$$2(q - \log_q N) > \frac{2(q - 1)^2}{q} - \frac{3\Big(q - 1 - t\big(q - 1 - \frac{(q-1)^2}{q}\big)\Big)^2}{\Big(2t\big(q - 1 - \frac{(q-1)^2}{q}\big) + q - 1\Big)\ln\frac{N-t}{\epsilon}} = \frac{2(q - 1)^2}{q} - \frac{3(q - 1)(1 - t/q)^2}{(1 + 2t/q)\ln\frac{N-t}{\epsilon}}.$$

This basically restricts t to be about $O(\sqrt{M})$ (since $1 - t/q$
must be nonnegative). Hence, Theorem 3 does not obtain any
meaningful improvement from the Kautz-Singleton construction
in the asymptotics except in special cases.

There are two places where the Kautz-Singleton construction
can be modified: 1) instead of Reed-Solomon code one
can use any other q-ary code of different length, and 2) instead
of the mapping $\phi$ any binary constant-weight code of size q
might have been used. For a general discussion we refer the
reader to [13, §7.4]. In the recent work [35], the mapping $\phi$ is
kept the same, while the RS code has been changed to a q-ary
code that achieves the Gilbert-Varshamov bound [28], [36].

In our construction of disjunct matrices we use the Kautz-
Singleton construction and instead of Reed-Solomon code either
1) follow the footsteps of [35] to use a Gilbert-Varshamov
code or 2) use Algebraic-Geometric codes. We exploit some
property of the resulting scheme (namely, the average distance)
and do a finer analysis that was absent from the previous works
such as 1 ^ .

B. q-ary Gilbert-Varshamov construction 

Next, we construct a linear q-ary code of size N, length $M_q$
and minimum distance $d_q$ that achieves the Gilbert-Varshamov
(GV) bound [28], [36]. We describe the bound in Appendix

El

Porat and Rothschild [35] show that it is possible to construct
in time $O(M_q N)$ a q-ary code that achieves the GV
bound. To have such construction, they exploit the following
well-known fact: a q-ary linear code with random generator
matrix achieves the GV bound with high probability ll^ . To
have an explicit construction of such codes, a derandomization
method known as the method of conditional expectation 11 is
used. In this method, the entries of the generator matrix of the
code are chosen one-by-one so that the minimum distance of
the resulting code does not go below the value prescribed by
Eq. (??). For a detailed description of the procedure, see [35].

Using the GV code construction of Porat and Rothschild
and plugging it in the Kautz-Singleton construction above, we
have the following proposition.

Proposition 4: Let $s < q$. There exists a polynomial
time constructible family of $\big(M, N, \frac{2M}{q}(1 - 1/s), M/q\big)$-
constant-weight binary codes that satisfy,

$$M/q \le \frac{s \ln N}{\ln(q/s) - 1}. \qquad (3)$$

Although the proof of the above proposition is essentially in
Porat and Rothschild [35], we have a cleaner proof that we
include in Appendix A for completeness.

However, we are also concerned with the average distance 
of the code. Indeed, we have the following proposition. 

Proposition 5: The average distance of the code constructed
in Prop. 4 is

$$D = \frac{2M}{q}(1 - 1/q).$$

Proof: For Prop. 4 we have followed the Kautz-Singleton
construction. We take a linear q-ary code $C'$ of length $M_q = M/q$,
size N and minimum distance $d_q = d/2$. Each q-ary symbol
in the codewords is then replaced with a binary indicator vector
of length q (i.e., the binary vector all of whose entries are zero but
one entry, which is 1) according to the map $\phi$. As a result we
have a binary code C of length M and size N. The minimum
distance of the code is d and the codewords are of constant
weight $w = M_q = M/q$. The average distance of this code is
twice the average distance of the q-ary code. As $C'$ is linear
(assuming it has no all-zero coordinate), it has average distance
equal to

$$\frac{1}{N}\sum_{j=0}^{M_q} j A_j = M_q(1 - 1/q),$$

where $A_j$ is the number of codewords of weight j in $C'$. Here
we use the fact that the average of the distance between any
two randomly chosen codewords of a nontrivial linear code is
equal to that of a binomial random variable [28]. Hence the
constant-weight code C has average distance $D = 2M_q(1 - 1/q)$. ■

or when,

$$\frac{d}{2} > \frac{M}{q} - \frac{3\frac{M}{q}(1 - t/q)^2}{2(2t/q + 1)\ln\frac{N}{\epsilon}}. \qquad (5)$$

Hence a sufficient condition is to choose the constant-weight
code such that,

$$d > \frac{2M}{q}\Big(1 - \frac{3(1 - t/q)^2}{2(2t/q + 1)\ln\frac{N}{\epsilon}}\Big).$$

We can take q to be the smallest power of a prime that is greater
than 2t, which will make the sufficient condition look like,

$$d > \frac{2M}{q}\Big(1 - \frac{3}{16\ln\frac{N}{\epsilon}}\Big).$$

However, according to Prop. 4, such a code can be explicitly
constructed with,

$$M/q \le \frac{\frac{16}{3}\ln\frac{N}{\epsilon}\,\ln N}{\ln\big(3t/(16\ln\frac{N}{\epsilon})\big) - 1}. \qquad (6)$$

Hence, the sufficient number of tests is $M = O\big(\frac{t}{\log t}\log N \log\frac{N}{\epsilon}\big)$.

D. Construction of almost disjunct matrix: Model 2

We again follow the above code construction and choose q
to be a power of a prime number. With proper parameters we
can have a disjunct matrix with the following property.

Theorem 5: It is possible to explicitly construct a $(t, \epsilon)$-
disjunct matrix of size $M \times N$ where

$$M = O\Big(\frac{t}{\log t}\log N \log\frac{N}{\epsilon}\Big).$$

Proof: We follow the Kautz-Singleton code construction
as earlier. That is we have an $(M, N, d, M/q)$-constant-weight
code that satisfies Prop. 4 and 5. Hence, average distance $D = \frac{2M}{q}(1 - 1/q)$.
The resulting matrix will be $(t, \epsilon)$-disjunct if the
condition of Theorem 3 is satisfied, i.e.,

C. Constructions for Model 1 

We follow the Kautz-Singleton code construction. Suppose, 
we have a (M, N, d, M/q)- constant-weight code that satisfies 
Prop. 0] and |5] Hence, average distance D = ^(1 — l/g). 
The resulting test matrix will satisfy the condition of Thm. 
when. 


d/2 > M/q 


3| 

(M/g-f(M/g-M/g(l-l/g)) 

)’ 

2{2t{M/q - M/g(l - 1/g)) -f M/g) 

e 


(4) 


d>—{l-l/q) 

q 


2,{M/q - t{M/q - M/q{l - l/q))'^ 
(2t{M/q - M/q{l - l/q)) + M/q^ In f ’ 


or when. 


2M 3M/q(l-t/q) 

d > —(1 - l/q) - ^ f ■ 

9 [21/q + ij In f 


(7) 


Hence a sufficient condition is to choose the constant-weight 
code such that. 


d > 


2M / 1 

a \ a 


3| 

(1 - t/q 


2(21 /q+l^ 

e 


Since the requirement of above sufficient condition is slightly 
weaker than that of (??), we can still choose g to be a 












smallest power of prime that is greater than 2t, and follow 
the calculations for Model 1, to obtain total number of tests 

■ 

It is clear from Prop. 2 that a $(t, \epsilon)$-disjunct matrix is equivalent to a group testing scheme. Hence, as a consequence of Theorem 5 we will be able to construct a testing scheme with $O\left(\frac{t}{\ln t}\ln N\ln\frac{N}{\epsilon}\right)$ tests. Whenever the defect model is such that all the possible defective sets of size $t$ are equally likely and there are no more than $t$ defective elements, the above group testing scheme will be successful with probability at least $1 - \epsilon$.

Note that, if $t$ is proportional to any positive power of $N$, then $\log N$ and $\log t$ are of the same order. Hence it will be possible to have the above testing scheme with $O(t\log N)$ tests, for any $\epsilon > 0$.


E. Constructions based on Algebraic-Geometric codes 

Now, instead of using the Porat-Rothschild construction of GV codes, we can use the Algebraic-Geometric (AG) code construction of Tsfasman, Vlăduţ and Zink [??]. In particular, we can base our construction on the Garcia-Stichtenoth tower of function fields over $\mathbb{F}_q$ [42, Sec. 3.4.3].

Assume $q = r^2$, where $r$ is any integer. For any even number $n$, there is a family of modular curves with genus $g_n = (r^{n/2} - 1)^2$ and with the number of points being at least $r^{n+1} - r^n + 1$ (see [42, Theorem 3.4.44]). Now, using Corollary 4.1.14 of [??], we conclude that it is possible to construct families of linear codes of length $M_q$, size $N$ and minimum distance $d_q$, where

$$M_q \ge r^{n+1} - r^n + 1,$$

and

$$\log_q N = M_q - d_q - g_n + 1.$$

Hence, we obtain families of linear codes such that

$$\frac{\log_q N}{M_q} \ge 1 - \frac{d_q}{M_q} - \frac{1}{\sqrt{q} - 1}. \tag{8}$$
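The step from the genus and point count to (8) rests on $g_n / M_q \le 1/(\sqrt{q} - 1) = 1/(r - 1)$. A minimal check of this inequality, using exact rational arithmetic and the formulas stated above (the function name is hypothetical), could look as follows.

```python
from fractions import Fraction

def gs_parameters(r: int, n: int):
    """Genus and lower bound on the number of points, for even n."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    genus = (r ** (n // 2) - 1) ** 2
    min_points = r ** (n + 1) - r ** n + 1
    return genus, min_points

# Since M_q >= min_points, g_n / M_q <= genus / min_points < 1 / (r - 1).
for r in (2, 3, 4, 5, 7, 8, 9):
    for n in (2, 4, 6, 8):
        genus, min_points = gs_parameters(r, n)
        assert Fraction(genus, min_points) < Fraction(1, r - 1)

print(gs_parameters(4, 4))
```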


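Now, using the Kautz-Singleton mechanism of converting this to a binary code, we obtain an $(M, N, d, w)$ constant-weight code, where

$$M = qM_q; \quad d = 2d_q; \quad w = M_q = M/q,$$

and

$$d \ge \frac{2M}{q}\left(1 - \frac{q\log_q N}{M} - \frac{1}{\sqrt{q} - 1}\right). \tag{9}$$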
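The relations $M = qM_q$, $d = 2d_q$ and $w = M_q$ can be checked on a toy example. The sketch below is not an AG code: it uses a small Reed-Solomon code over $\mathbb{F}_5$ (evaluations of $a + bx$), chosen only because it is easy to enumerate, and the function names are hypothetical.

```python
from itertools import combinations, product

def kautz_singleton(codeword, q):
    """Replace each symbol a in {0, ..., q-1} by its length-q indicator vector."""
    bits = []
    for a in codeword:
        block = [0] * q
        block[a] = 1
        bits.extend(block)
    return bits

def hamming(u, v):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(u, v))

q = 5  # prime, so arithmetic mod q is a field
# Toy q-ary code: evaluations of a + b*x at every x in F_5.
outer = [[(a + b * x) % q for x in range(q)] for a, b in product(range(q), repeat=2)]
binary = [kautz_singleton(c, q) for c in outer]

M_q = len(outer[0])
M = len(binary[0])
d_q = min(hamming(u, v) for u, v in combinations(outer, 2))
d = min(hamming(u, v) for u, v in combinations(binary, 2))
print(M, M_q, d, d_q)
```

Two codewords that differ in one $q$-ary symbol differ in exactly two binary positions, which is why the distance doubles.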


Since the AG code is a linear code, we can calculate the average distance of the above constant-weight code as in Proposition 5. Indeed, the average distance is $D = \frac{2M}{q}\left(1 - \frac{1}{q}\right)$.

For this family of codes, we can also calculate the second moment of the distance distribution*, which allows us to use Theorem 4. To be consistent with the rest of the paper, we rely on only the average distance, and use Theorem 3 instead.
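The average-distance formula can be verified exactly on a small linear code. The sketch below uses an arbitrary generator matrix over $\mathbb{F}_3$ with no all-zero column (an assumption of Proposition 5) and averages over all ordered pairs of codewords drawn with replacement, which is the convention assumed here. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

q = 3
G = [[1, 0, 1, 2],
     [0, 1, 1, 1]]  # generator matrix over F_3, no all-zero column
M_q = len(G[0])

def encode(msg):
    """Codeword msg * G over F_q."""
    return [sum(m * g for m, g in zip(msg, col)) % q for col in zip(*G)]

def indicator(codeword):
    """Kautz-Singleton map: each symbol becomes a length-q indicator vector."""
    bits = []
    for a in codeword:
        bits.extend(1 if i == a else 0 for i in range(q))
    return bits

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

binary = [indicator(encode(msg)) for msg in product(range(q), repeat=len(G))]
N, M = len(binary), q * M_q

# Exact average distance over all N^2 ordered pairs.
avg = Fraction(sum(hamming(u, v) for u in binary for v in binary), N * N)
print(avg, Fraction(2 * M, q) * (1 - Fraction(1, q)))
```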


*It turns out that D 2 = 4^(1 - 4). 

V q/ 


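Substituting the values of $D$, $w$ in Theorem 3 we obtain the following. If

$$d \ge \frac{2M}{q}\left(1 - \frac{1}{q} - \frac{3(1 - t/q)^2}{2(2t/q + 1)\ln\frac{N}{\epsilon}}\right), \tag{10}$$

then the construction is $(t, \epsilon)$-almost disjunct. Comparing (??) and (??), we claim that our construction is $(t, \epsilon)$-almost disjunct as long as

$$\frac{q\log_q N}{M} + \frac{1}{\sqrt{q} - 1} \le \frac{1}{q} + \frac{3(1 - t/q)^2}{2(2t/q + 1)\ln\frac{N}{\epsilon}}.$$

Now assuming $q$ to be the smallest power of 2 greater than $2t$, we see that the above condition is satisfied when

$$M \ge \frac{16\,t\ln N}{\ln 2t}\ln\frac{N}{\epsilon}.$$

We should note that the construction for Model 1 can be done in the exact same way to obtain the same parameters.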


Remark 1 (The traditional argument (Prop. 3) with Algebraic-Geometric codes): Note that one could use AG codes in conjunction with Prop. 3 to obtain disjunct matrices. However, such a construction results in a highly suboptimal number of rows (tests). Indeed, substituting Eq. (??) in Prop. 3 we have a $t$-disjunct matrix with

$$t = \frac{1}{\frac{q\log_q N}{M} + \frac{1}{\sqrt{q} - 1}},$$

that is,

$$\frac{q\log_q N}{M} = \frac{1}{t} - \frac{1}{\sqrt{q} - 1}.$$

Hence, to get anything nontrivial we must have $q > t^2$, which results in $M = \Omega(t^3\log N/\log t)$. This is quite bad compared to the optimal constructions that give disjunct matrices with $O(t^2\log N)$ rows. It is interesting that by using our average-distance-based arguments we are able to get rid of such suboptimality with AG codes. Intuitively, while the range of minimum distance of the constant-weight codes obtained from the AG codes is not sufficient for optimal results, the combination of average distance and minimum distance for these codes indeed belongs to the best possible region.


VI. Conclusion 

In this work we show that it is possible to construct nonadaptive group testing schemes with a small number of tests that identify a uniformly chosen random defective configuration with high probability. To construct a $t$-disjunct matrix one starts with the simple relation between the minimum distance $d$ of a constant-weight code of weight $w$ and $t$. This is an example of a scenario where a pairwise property (i.e., distance) of the elements of a set is translated into a property of $t$-tuples.

Our method of analysis provides a general way to prove that a property holds for almost all $t$-tuples of elements from a set based on the mean pairwise statistics of the set. Our method might be useful in many areas of applied combinatorics, such as digital fingerprinting or design of key-distribution schemes, where such a translation is evident. With this method, potential new results may be obtained for the cases of cover-free codes [??], [25], [??], and traceability and frameproof codes [??].


















Appendix 

A. Gilbert-Varshamov bound and proof of Prop. 4

Lemma 2 (Gilbert-Varshamov Bound): There exists an $(m, N, d)_q$-code such that

$$N \ge \frac{q^m}{\sum_{i=0}^{d-1}\binom{m}{i}(q-1)^i}. \tag{12}$$

Corollary 1: Suppose $X$ is a Binomial$(m, 1 - \frac{1}{q})$ random variable. There exists an $(m, N, d)_q$-code such that

$$N \ge \frac{1}{\Pr(X < d)}.$$
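Corollary 1 is a restatement of Lemma 2, since $\Pr(X = i) = \binom{m}{i}(q-1)^i/q^m$. A short exact check with rational arithmetic, using hypothetical function names, is sketched below.

```python
from fractions import Fraction
from math import comb

def gv_size(m: int, d: int, q: int) -> Fraction:
    """Right-hand side of (12): q^m / sum_{i<d} C(m, i) (q-1)^i."""
    return Fraction(q ** m, sum(comb(m, i) * (q - 1) ** i for i in range(d)))

def prob_below(m: int, d: int, q: int) -> Fraction:
    """Pr(X < d) for X ~ Binomial(m, 1 - 1/q), computed exactly."""
    p = Fraction(q - 1, q)
    return sum(comb(m, i) * p ** i * (1 - p) ** (m - i) for i in range(d))

# The two expressions agree for every parameter choice tried here.
for m, d, q in [(10, 4, 3), (15, 7, 5), (20, 10, 8)]:
    assert gv_size(m, d, q) == 1 / prob_below(m, d, q)

print(float(gv_size(20, 10, 8)))
```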

Lemma 3: Suppose $X$ is a Binomial$(m, 1 - \frac{1}{q})$ random variable. Then, for all $s < q$,

$$\Pr\left(X < m\left(1 - \frac{1}{s}\right)\right) \le e^{-mD(1/s\,\|\,1/q)}, \tag{13}$$

where $D(p\,\|\,p') = p\ln(p/p') + (1 - p)\ln((1 - p)/(1 - p'))$.

Theorem 6: Let $s < q$. For the $(m, N, m(1 - 1/s))_q$-code that achieves the Gilbert-Varshamov bound, we have

$$m \le \frac{s\ln N}{\ln(q/s) - 1}. \tag{14}$$

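Proof: This theorem follows from Corollary 1 and Lemma 3. Note that

$$D(1/s\,\|\,1/q) = \frac{1}{s}\ln\frac{q}{s} + \left(1 - \frac{1}{s}\right)\ln\frac{1 - 1/s}{1 - 1/q} \ge \frac{1}{s}\ln\frac{q}{s} + \left(1 - \frac{1}{s}\right)\ln\left(1 - \frac{1}{s}\right) \ge \frac{1}{s}\ln\frac{q}{s} - \frac{1}{s},$$

where in the last line we have used the fact that $x\ln x \ge x - 1$ for all $x > 0$. Combining this with Corollary 1 and Lemma 3 gives $\ln N \ge mD(1/s\,\|\,1/q) \ge \frac{m}{s}\left(\ln(q/s) - 1\right)$, which is (14). ■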
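Both ingredients of the proof can be checked numerically for a few parameter choices. The sketch below compares the exact binomial tail with the bound of Lemma 3 and verifies the lower bound on the divergence; it assumes $1 < s < q$, and the function names are hypothetical.

```python
import math
from math import comb

def kl(p: float, p2: float) -> float:
    """D(p || p2) = p ln(p/p2) + (1-p) ln((1-p)/(1-p2))."""
    return p * math.log(p / p2) + (1 - p) * math.log((1 - p) / (1 - p2))

def tail(m: int, q: int, s: float) -> float:
    """Pr(X < m(1 - 1/s)) for X ~ Binomial(m, 1 - 1/q)."""
    p = 1 - 1 / q
    threshold = m * (1 - 1 / s)
    return sum(comb(m, i) * p ** i * (1 - p) ** (m - i)
               for i in range(m + 1) if i < threshold)

for m, q, s in [(40, 8, 2), (60, 16, 4), (30, 5, 2)]:
    divergence = kl(1 / s, 1 / q)
    # Lemma 3: the exact tail is at most exp(-m D(1/s || 1/q)).
    assert tail(m, q, s) <= math.exp(-m * divergence) + 1e-12
    # Proof of Theorem 6: D(1/s || 1/q) >= (ln(q/s) - 1) / s.
    assert divergence >= (math.log(q / s) - 1) / s

print(tail(40, 8, 2), math.exp(-40 * kl(1 / 2, 1 / 8)))
```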

Using the Kautz-Singleton construction, this implies that there exists a polynomial-time constructible family of $(M, N, \frac{2M}{q}(1 - 1/s), M/q)$ constant-weight binary codes with

$$\frac{M}{q} \le \frac{s\ln N}{\ln(q/s) - 1},$$

which is Prop. 4.